1 Introduction

The CAC 40, the main stock market index of the Paris stock exchange, is a key indicator of the state of the French and European economy. Made up of 40 of the largest market capitalizations listed on Euronext Paris, this index provides a representative synthesis of the performance of leading companies in strategic sectors such as industry, finance, and technology. It plays a central role in investor decision-making, economic cycle analysis, and assessing the competitiveness of French companies on a global scale.

In an economic context marked by the COVID-19 pandemic, rising interest rates, and geopolitical tensions, it becomes crucial to examine the key factors driving fluctuations in the CAC 40. This project aims to achieve two main objectives: first, to analyze the sectoral structure of the index to identify internal dynamics affecting its performance; and second, to study the correlations between the CAC 40 and macroeconomic variables such as interest rates and currency fluctuations.

To this end, we will combine traditional statistical approaches and machine learning tools, enriched with graphical visualizations and predictive models. This methodology will provide a comprehensive and rigorous analysis of the phenomena under study. Finally, a reflection on the practical implications of these results will be conducted, offering insights for both investors and financial analysts to inform their strategic decisions.

2 Data Import and Preparation

2.1 Raw Data

The files used contain economic, financial, and stock market information necessary to analyze variations in the CAC 40. Below is an overview of the files and their contents:

  1. cac40_composition.csv: List of companies composing the CAC 40, including their sectoral and financial characteristics.
  2. euribor_3m_rate.csv: Historical data on the 3-month Euribor rates.
  3. france_unemployment_rate.csv: Annual data on the unemployment rate in France.
  4. daily_data.csv: Daily data on the evolution of the CAC 40, S&P 500, the euro/USD exchange rate, and other financial variables.

The Euribor rates and unemployment rate were not integrated into the daily_data table, as they are reported at monthly and annual frequencies, respectively, unlike the variables in daily_data.csv, which are recorded on a daily basis.

Dataset Dimensions
File name                   Rows  Columns
cac40_composition             40       11
euribor_3m_rate              371        2
france_unemployment_rate      49        2
daily_data                  8767        5

2.2 Variable Typing

The next step is to adjust the variables based on their meaning. Most of them are imported as character strings (chr). Therefore, it is necessary to convert dates to the Date format, numbers to numeric, etc.
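The conversion step can be illustrated with a minimal Python sketch. The console output in this report suggests the project itself was written in R, so this is not the project's code. The raw formats it assumes are hypothetical: ISO `YYYY-MM-DD` dates, numbers that may contain a comma thousands separator, and empty strings for missing values. The sample row is also made up.

```python
from datetime import date


def parse_row(raw: dict) -> dict:
    """Convert one raw CSV row (all values read as strings) into typed values.

    Assumed formats (hypothetical): ISO dates in the "Date" column, numbers
    that may contain a comma thousands separator, "" for a missing value.
    """
    typed = {}
    for column, value in raw.items():
        value = value.strip()
        if column == "Date":
            typed[column] = date.fromisoformat(value)
        elif value == "":
            typed[column] = None  # keep gaps explicit so they can be interpolated later
        else:
            typed[column] = float(value.replace(",", ""))
    return typed


# Illustrative row, not taken from the actual files
row = parse_row({"Date": "2023-01-05", "cac40_price": "7,000.50", "euro_usd": "1.05"})
print(row)
```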

After adjusting the data types to ensure consistency with the intended analyses, we provide below a detailed description of the variables present in each dataset, specifying their role and meaning within the scope of this study.

Variable Description
Dataset                   Variable                     Type       Description
cac40_composition         company_name                 character  Name of the listed company
cac40_composition         sector                       character  Company’s business sector
cac40_composition         stock_price                  numeric    Company’s stock price
cac40_composition         Ticker                       character  Company’s ticker symbol (stock symbol)
cac40_composition         shares_outstanding_millions  numeric    Number of shares outstanding (millions)
cac40_composition         capitalization_in_billion    numeric    Market capitalization (billions)
cac40_composition         proportion_ds_indice         numeric    Weight in the CAC 40 index
cac40_composition         dividend_2023_in_millions    integer    Dividends paid in 2023 (millions)
cac40_composition         part_dividende_2023          numeric    Proportion of total dividends in 2023
cac40_composition         stock_price_5jan             numeric    Stock price on January 5th
cac40_composition         stock_price_30dec            numeric    Stock price on December 30th
euribor_3m_rate           EURIBOR_3M_Rate              numeric    3-month Euribor rate
euribor_3m_rate           Date                         Date       Recording date
france_unemployment_rate  Date                         Date       Recording year
france_unemployment_rate  unemployment_rate            numeric    Annual unemployment rate
daily_data                Date                         Date       Recording date
daily_data                euro_usd                     numeric    EUR/USD exchange rate
daily_data                sp500_price                  numeric    S&P 500 index price
daily_data                cac40_price                  numeric    CAC 40 index price
daily_data                urw_price                    numeric    Unibail-Rodamco-Westfield stock price

2.3 Managing Missing Values

The data used in this study was collected over a 23-year period with daily granularity, limited to business days. While necessary to reflect financial market dynamics, this characteristic inevitably results in missing values for certain days, particularly due to public holidays or periods without trading.

To maintain the integrity of the time series and avoid significant data loss, we opted for linear interpolation of missing values. This method estimates missing values based on adjacent data, ensuring continuity in the analysis while minimizing potential distortions.

This methodological choice is justified by the length of the study period (23 years), which helps mitigate the impact of estimates on overall conclusions. Moreover, systematically removing rows with missing values would have significantly reduced the volume of usable data, risking biased results or limiting the scope of the analysis.
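The interpolation rule can be illustrated with a short Python sketch. In R, `zoo::na.approx` is a common way to do the same thing. This sketch interpolates by row position, which assumes the observations are evenly spaced. Leading and trailing gaps have no neighbour on one side, so they are left empty. The price series is made up.

```python
def interpolate_linear(values):
    """Fill interior gaps (None) by straight-line interpolation between the
    nearest known values on each side; leading/trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):  # positions strictly between two known values
            filled[left + k] = filled[left] + (filled[right] - filled[left]) * k / gap
    return filled


prices = [100.0, None, None, 106.0, 107.0, None, 109.0]  # hypothetical closes
print(interpolate_linear(prices))  # [100.0, 102.0, 104.0, 106.0, 107.0, 108.0, 109.0]
```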

Variable       Number of missing values
Date           0
euro_usd       0
sp500_price    0
cac40_price    0
urw_price      0

3 Sector Composition and Characteristics of the CAC 40

3.1 Sectoral Distribution

As the benchmark index of the Paris stock exchange, the CAC 40 brings together the forty main companies listed on Euronext Paris. The sectoral composition of the index reveals a strong concentration in a few specific sectors, which makes it a relevant reflection of French economic dynamics.

The chart above illustrates the sectoral distribution of the CAC 40, showing a clear predominance of the consumer sector (39.6%) and financial services (22.3%). This structure reflects the central role of these sectors in the French economy, both nationally and internationally. In contrast, the technology sector, while strategic, represents only 4.8% of the index. This is a small share compared with indices such as the Nasdaq, which is dominated by technology companies like Alphabet (Google), Apple, Microsoft, and Meta.

Regarding profit redistribution, the donut chart shows that the financial sector accounts for 29.8% of the dividends paid to shareholders, well above its 22.3% weight in the index. The consumer sector accounts for 18%, less than half of its 39.6% weight. The financial sector therefore returns a disproportionately large share of value to shareholders through dividends, while consumer companies distribute less relative to their size. The technology and utilities sectors make more modest contributions, suggesting a strategic focus on reinvestment.

The joint analysis of sectoral composition and dividend distribution sheds light on the performance drivers of the CAC 40. It also reveals the differentiated strategic choices of companies by sector—between maximizing shareholder value and prioritizing long-term development.
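The comparison between the two charts can be made explicit with a ratio of dividend share to index weight. Above 1, a sector distributes more than its weight; below 1, it distributes less. The Python sketch below aggregates company rows by sector. It assumes that `proportion_ds_indice` and `part_dividende_2023` use the same units (percent here). It is applied to the two sector-level figures quoted in the text, each entered as a single row.

```python
from collections import defaultdict


def dividend_to_weight(companies):
    """Sum index weight and dividend share per sector, then return
    dividend share / index weight for each sector."""
    weight = defaultdict(float)
    dividend = defaultdict(float)
    for c in companies:
        weight[c["sector"]] += c["proportion_ds_indice"]
        dividend[c["sector"]] += c["part_dividende_2023"]
    return {sector: dividend[sector] / weight[sector] for sector in weight}


# Sector-level percentages quoted in the text, entered as one row per sector
rows = [
    {"sector": "Financial services", "proportion_ds_indice": 22.3, "part_dividende_2023": 29.8},
    {"sector": "Consumer", "proportion_ds_indice": 39.6, "part_dividende_2023": 18.0},
]
for sector, ratio in dividend_to_weight(rows).items():
    print(f"{sector}: {ratio:.2f}")  # 1.34 for financial services, 0.45 for consumer
```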

3.2 Importance of Companies in the Index

The companies that make up the CAC 40 show highly heterogeneous weights, resulting in varied contributions to the weekly fluctuations of the index. A detailed analysis of relative contributions shows that certain companies, particularly large-cap firms such as LVMH or TotalEnergies, exert a disproportionately strong influence on the overall performance of the index.

A mosaic chart representing the contribution of companies highlights this dynamic. This visualization clearly shows that the 5 to 10 most influential companies are often responsible for a significant portion of the variations observed in the CAC 40.

To complement this analysis, a bar chart allows for a more precise quantification of individual contributions. This representation provides a useful tool for identifying companies whose impact, while less visible at first glance, remains significant in weekly fluctuations.
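Each company's contribution to a weekly move can be approximated as its index weight at the start of the week multiplied by its weekly return. Summing these contributions gives the index return. The Python sketch below assumes start-of-week weights that sum to 1 and ignores the index's free-float and capping adjustments. The three-stock index is hypothetical.

```python
def weekly_contributions(weights, returns):
    """Contribution of each company to the index's weekly return (w_i * r_i),
    sorted from the largest to the smallest absolute impact."""
    contributions = {name: weights[name] * returns[name] for name in weights}
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


weights = {"A": 0.5, "B": 0.3, "C": 0.2}      # hypothetical start-of-week index weights
returns = {"A": -0.02, "B": 0.01, "C": 0.04}  # hypothetical weekly simple returns
ranked = weekly_contributions(weights, returns)
index_return = sum(c for _, c in ranked)
print(ranked, round(index_return, 6))
```

Sorting by absolute contribution is what a bar chart of individual contributions displays. A company with a modest weight (C here) can still rank above a larger one (B) when its return is large.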

3.3 Regional Contributions and Sectoral Variations

The performance of CAC 40 companies cannot be isolated from the economic dynamics of the regions in which they operate. A geographic analysis of regional contributions in 2023 highlights the dominant influence of European and North American markets. The United States, in particular, benefits from strategic trade agreements that facilitate market access for many groups, strengthening its contribution to the index’s overall revenues. Moreover, Asian regions play a significant role, especially in the luxury sector, where they remain a key driver of growth for groups like LVMH and Hermès. This importance is also reflected in the international activities of large companies such as L’Oréal and Airbus. These economic zones capture a substantial share of the revenues generated by the index’s companies, illustrating the fundamental role of global dynamics in their results.

To further refine the analysis, a Ridgeline plot of weekly variations by sector gives an overview of the distribution for each industry. For instance, the technology sector stands out for its higher volatility, while the consumer and financial services sectors show distributions more concentrated around their mean. These differences may reflect differing investment behaviors or varying sector sensitivities to economic shocks.
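The spread that a ridgeline plot displays can be summarized numerically by the standard deviation of weekly returns per sector. The Python sketch below makes several assumptions:
- the last close of each ISO week is taken as the weekly price;
- the weekly returns of companies in the same sector are pooled;
- the prices belong to two hypothetical companies and are made up.

```python
import statistics
from datetime import date, timedelta


def weekly_returns(daily):
    """daily: (date, close) pairs sorted by date. Keep the last close of each
    ISO week, then compute simple week-over-week returns."""
    last_close = {}
    for day, close in daily:
        last_close[day.isocalendar()[:2]] = close  # (ISO year, ISO week) -> latest close
    closes = [last_close[week] for week in sorted(last_close)]
    return [b / a - 1 for a, b in zip(closes, closes[1:])]


def dispersion_by_sector(daily_by_company, sector_of):
    """Standard deviation of the pooled weekly returns of each sector."""
    pooled = {}
    for company, daily in daily_by_company.items():
        pooled.setdefault(sector_of[company], []).extend(weekly_returns(daily))
    return {sector: statistics.stdev(r) for sector, r in pooled.items() if len(r) > 1}


# Three weeks of hypothetical business-day closes starting Monday 2 January 2023
days = [date(2023, 1, 2) + timedelta(days=i) for i in range(21)]
days = [d for d in days if d.weekday() < 5]
closes = {
    "TechCo": [100.0] * 5 + [110.0] * 5 + [99.0] * 5,
    "ConsumerCo": [100.0] * 5 + [101.0] * 5 + [100.0] * 5,
}
daily_by_company = {name: list(zip(days, px)) for name, px in closes.items()}
spread = dispersion_by_sector(daily_by_company, {"TechCo": "Technology", "ConsumerCo": "Consumer"})
print(spread)
```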

These combined visual representations reveal the complexity of interactions between companies, sectors, and regions in the overall performance of the CAC 40. They provide key insights for identifying performance drivers and strategic opportunities for investors.

4 Variations and Performance of the CAC 40

4.1 Analysis of Time-Based Variations

Stock market indices, especially the CAC 40, react strongly to major economic events that significantly influence their returns. A comparative analysis of returns before and after these events highlights characteristic variations depending on the type of crisis. For example, the global financial crisis caused significant losses followed by a gradual recovery.

The study of daily return distributions provides valuable insights into the underlying dynamics of these variations. The histogram of daily returns for 2023 illustrates a symmetric distribution centered around zero, indicating overall stability. However, the presence of fat tails in this distribution highlights an increased frequency of extreme returns, often caused by exogenous shocks.

Checking the normality assumption with a QQ-plot shows that CAC 40 returns are close to normal in the center of the distribution, but significant deviations appear in the tails. These deviations underscore the index’s sensitivity to extreme events, a characteristic of a slightly leptokurtic distribution. This observation is crucial for financial risk management and return modeling, where accounting for rare but impactful events is essential.
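The tail behaviour seen in the QQ-plot can be quantified with sample skewness and excess kurtosis. A normal distribution has both equal to 0, and positive excess kurtosis indicates fat tails. The Python sketch below uses simple moment estimators; statistical packages may apply small-sample corrections. It compares simulated normal returns with a hypothetical mixture in which about 5% of days come from a wider distribution, mimicking occasional shocks.

```python
import random


def skew_and_excess_kurtosis(returns):
    """Moment-based sample skewness and excess kurtosis (kurtosis - 3)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


random.seed(0)
normal_days = [random.gauss(0, 0.01) for _ in range(5000)]
# Mostly calm days, with about 5% of days drawn from a much wider distribution
shock_days = [random.gauss(0, 0.03) if random.random() < 0.05 else random.gauss(0, 0.008)
              for _ in range(5000)]
print(skew_and_excess_kurtosis(normal_days))  # excess kurtosis close to 0
print(skew_and_excess_kurtosis(shock_days))   # clearly positive excess kurtosis
```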

These analyses demonstrate the importance of integrating robust and tailored methodologies to understand the specific characteristics of volatility and observed distributions in financial markets.

4.2 Study of Daily and Weekly Returns

The analysis of daily returns in 2023 highlights a complex dynamic, marked by significant fluctuations at times. A graphical representation in the form of a heatmap of daily variations clearly illustrates this volatility: periods of sharp decline (marked in red) are mainly concentrated in the first quarter, often linked to global macroeconomic uncertainties. In contrast, positive peaks (in green) indicate rebound periods, generally associated with favorable announcements or market adjustments.

To deepen the analysis, a monthly boxplot of daily returns reveals significant dispersion in the first half of the year, consistent with the fat tails of the histogram and indicating high volatility. In contrast, the second half shows gradual stabilization, with monthly medians converging toward zero, reflecting a calmer market.

These observations highlight a generally symmetric behavior of returns, but punctuated by extreme values, requiring increased vigilance from investors. The distribution of returns, while approaching a normal curve, remains influenced by exogenous shocks, emphasizing the importance of integrating these rare events into financial models.

4.3 Volatility and Returns Around Specific Events

Major exogenous events, such as Brexit or the 2008 financial crisis, induce significant disruptions in the returns and volatility of the CAC 40 index. Analyzing these periods helps decipher how financial markets respond to uncertainty and reveals the underlying dynamics.

The evolution of average daily returns around Brexit clearly illustrates the impact of this event on the markets. Before the official announcement (blue dashed line), returns were characterized by relative stability, with moderate variations. However, the Brexit announcement triggered a sharp drop followed by a rapid rebound. This behavior reflects heightened short-term volatility, typical of periods of major political uncertainty. This phenomenon, amplified by portfolio adjustments in response to unexpected scenarios, underscores the importance of economic agents’ anticipations.
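A before/after comparison of this kind can be sketched in Python. The function locates the first trading day on or after the event date. It then compares the mean and standard deviation of daily returns over a fixed number of trading days on each side. The window length and the returns are synthetic assumptions for illustration. The event date is 24 June 2016, the day the referendum result was announced.

```python
import statistics
from datetime import date, timedelta


def event_window_stats(daily_returns, event_day, window=10):
    """daily_returns: (date, return) pairs sorted by date. Compare the `window`
    trading days before the event with the `window` days starting on it."""
    start = next(i for i, (day, _) in enumerate(daily_returns) if day >= event_day)
    before = [r for _, r in daily_returns[max(0, start - window):start]]
    after = [r for _, r in daily_returns[start:start + window]]
    return {
        "before": (statistics.mean(before), statistics.stdev(before)),
        "after": (statistics.mean(after), statistics.stdev(after)),
    }


# Synthetic business-day returns: calm market, then a shock and a partial rebound
days = [date(2016, 6, 1) + timedelta(days=i) for i in range(45)]
days = [d for d in days if d.weekday() < 5]
rets = [0.002 if i % 2 else -0.002 for i in range(len(days))]
shock = days.index(date(2016, 6, 24))
rets[shock], rets[shock + 1] = -0.08, 0.03
stats = event_window_stats(list(zip(days, rets)), date(2016, 6, 24), window=5)
print(stats)
```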

Regarding the long-term evolution of volatility, the boxplot of monthly volatilities of CAC 40 returns highlights major peaks during global crises, particularly in 2008 (global financial crisis) and 2020 (COVID-19 pandemic). These periods of extreme turbulence resulted in amplified return variations, direct reflections of economic uncertainty and massive portfolio adjustments by investors. Conversely, the years following these crises show a gradual stabilization trend in volatility levels, indicating a progressive return to market normality.
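Monthly volatility figures like those in the boxplot can be sketched as the standard deviation of daily returns within each calendar month. Grouping the results by year gives one box per year. Annualizing by the square root of 252 trading days is a common convention assumed here; the report does not specify it. The return series in the example is synthetic.

```python
import math
import statistics
from datetime import date


def monthly_volatility(daily_returns, trading_days=252):
    """daily_returns: (date, return) pairs. Annualized standard deviation of
    daily returns within each calendar month, keyed by (year, month)."""
    by_month = {}
    for day, r in daily_returns:
        by_month.setdefault((day.year, day.month), []).append(r)
    return {month: statistics.stdev(rs) * math.sqrt(trading_days)
            for month, rs in sorted(by_month.items()) if len(rs) > 1}


def group_by_year(monthly):
    """One list of monthly volatilities per year: the data behind one box per year."""
    years = {}
    for (year, _), vol in monthly.items():
        years.setdefault(year, []).append(vol)
    return years


# Synthetic example: a turbulent month versus a calm one
data = [(date(2008, 10, d), 0.04 * (-1) ** d) for d in range(1, 21)]
data += [(date(2010, 10, d), 0.005 * (-1) ** d) for d in range(1, 21)]
vol = monthly_volatility(data)
print(group_by_year(vol))
```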

These analyses reveal that, although generally stable, financial markets remain vulnerable to unforeseen exogenous shocks. Integrating these risks into investment strategies is essential, requiring a rigorous assessment of the intensity and duration of each event. These elements are fundamental prerequisites for developing robust models capable of capturing future fluctuations and mitigating the effects of volatility.

5 Relationships with Other Economic Variables

5.1 Macroeconomic Factors

The analysis of the relationship between CAC 40 index returns and 3-month Euribor rates reveals a slightly negative correlation (-0.13), highlighted by the downward slope of the regression line. This result contradicts the expected positive correlation during periods of economic recovery, where rising Euribor rates reflect strengthening economic activity. Here, the weak negative correlation suggests that the impact of Euribor rates on CAC 40 returns is marginal in a broader context.

## [1] "Correlation coefficient: -0.13"
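The Euribor series is monthly while CAC 40 prices are daily (see Section 2.1), so the two must be brought to a common frequency before computing a correlation. The report does not state how this was done. The Python sketch below assumes month-end closes, monthly simple returns, and a Euribor series keyed by (year, month). The example data are invented.

```python
import math
from datetime import date


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def returns_vs_rate(daily_cac, euribor):
    """Correlate monthly CAC 40 returns (from month-end closes) with the
    Euribor rate of the same month; daily_cac is sorted by date."""
    month_end = {}
    for day, close in daily_cac:
        month_end[(day.year, day.month)] = close  # later days overwrite earlier ones
    months = sorted(month_end)
    pairs = [(month_end[cur] / month_end[prev] - 1, euribor[cur])
             for prev, cur in zip(months, months[1:]) if cur in euribor]
    monthly_returns, rates = zip(*pairs)
    return pearson(monthly_returns, rates)


cac = [(date(2020, m, 28), p) for m, p in zip(range(1, 7), [100, 102, 101, 104, 103, 106])]
euribor = {(2020, m): r for m, r in zip(range(2, 7), [-0.5, -0.3, -0.5, -0.3, -0.5])}
print(f"Correlation coefficient: {returns_vs_rate(cac, euribor):.2f}")
```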

However, segmenting the analysis by economic cycles reveals important nuances. During crises, such as the 2008 recession or the European sovereign debt crisis (2011-2012), the influence of Euribor rates on returns becomes more pronounced. This can be explained by accommodative monetary policies during crises that lower Euribor rates, supporting liquidity and boosting investor confidence. In contrast, during post-crisis phases, the relationship becomes less distinct, as investors focus more on economic fundamentals and corporate performance.

These results highlight the complexity of interactions between monetary policies and financial markets. They also emphasize the importance of considering the economic context and market cycles to understand how interest rates impact stock indices. This approach helps refine investment strategies and anticipate the future evolution of the CAC 40 in various economic environments.

5.2 Relationship with Currencies

The relationship between the EUR/USD exchange rate and CAC 40 index variations shows a negative correlation of -0.24, indicating that an appreciation of the euro against the U.S. dollar tends to be associated with a decline in the French stock index. This phenomenon is based on fundamental economic mechanisms affecting companies in the CAC 40.

On one hand, a rising euro reduces the competitiveness of French exports, especially in markets where the dollar is the reference currency, such as North America. Large exporting companies like Airbus and LVMH see their products become more expensive for local consumers, potentially reducing their market share. On the other hand, a strong euro also decreases profits earned in dollars when converted to euros, affecting multinationals that generate a significant portion of their revenues internationally.
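The translation effect is simple arithmetic. EUR/USD is quoted in US dollars per euro, so dollar revenue is converted to euros by dividing by the rate. In the Python sketch below (illustrative rates only), a rise from 1.10 to 1.20 lowers the euro value of the same 100 dollars from about 90.91 to 83.33 euros, roughly 8.3% less.

```python
def usd_to_eur(amount_usd, eur_usd):
    """Convert a US-dollar amount to euros; eur_usd is in dollars per euro."""
    return amount_usd / eur_usd


for rate in (1.10, 1.20):
    print(f"EUR/USD = {rate:.2f}: 100 USD = {usd_to_eur(100, rate):.2f} EUR")

change = usd_to_eur(100, 1.20) / usd_to_eur(100, 1.10) - 1
print(f"Change in euro value: {change:.1%}")  # -8.3%
```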

During the 2008 financial crisis, for example, the euro reached about 1.60 US dollars. This coincided with a sharp decline in the CAC 40 index, exacerbated by a contraction in global demand. Conversely, during the European sovereign debt crisis (2011-2012), a depreciation of the euro boosted French exports, strengthening the performance of CAC 40 companies.

However, the graphical analysis also reveals significant dispersion in the data points, reflecting the influence of other factors such as macroeconomic conditions, monetary policies, or sector-specific dynamics. Thus, while the exchange rate is a crucial factor, it must be considered within a more complex framework where multiple dynamics affect corporate and stock index performance.

5.3 Correlation Matrix

The correlation matrix is an essential analytical tool, providing a synthetic view of linear relationships between key economic and financial variables influencing the evolution of the CAC 40 index. The analysis reveals several significant trends, notably a strong positive correlation with the S&P 500 index. This connection reflects the interconnectedness of global stock markets, particularly during periods of crises or global economic recoveries, such as the 2008 financial crisis or the 2020 pandemic. These events illustrate how major indices synchronize in response to macroeconomic shocks.

In contrast, the negative correlation observed between the CAC 40 and the EUR/USD exchange rate reflects an inverse dynamic. An appreciation of the euro, as an unfavorable factor for French exporting companies, tends to weigh on the index’s performance. This phenomenon can be explained by reduced price competitiveness for French companies in international markets, directly affecting their profit margins.

Furthermore, a moderate correlation between 3-month Euribor rates and the CAC 40 highlights the indirect influence of interest rate variations on investor expectations. The European Central Bank’s (ECB) monetary policies play a central role here: for example, the rate cuts in 2012, aimed at supporting the recovery after the sovereign debt crisis, helped boost European stock markets.

Finally, a significant negative correlation is identified between the CAC 40 and the unemployment rate. High unemployment, an indicator of unfavorable economic conditions, is often associated with weaker stock performance, reflecting stagnant domestic demand, declining corporate profits, and eroded investor confidence. This link was particularly evident during the economic tensions in the eurozone.

These correlations illustrate the complexity of interactions between economic and financial variables. They reinforce the importance for investors to take these relationships into account in their analyses to anticipate index movements in various economic contexts.
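A correlation matrix is the Pearson correlation of every pair of columns, as in the Python sketch below. Two assumptions matter when reproducing it:
- Series recorded at different frequencies (daily prices, monthly Euribor, annual unemployment) must first be aligned to a common frequency.
- Correlations between trending price levels can be large even for unrelated series (spurious correlation), so it should be stated whether levels or returns are used.

The data below are invented.

```python
import math


def correlation_matrix(columns):
    """columns: dict of name -> equal-length list of observations.
    Returns a nested dict m[a][b] holding the Pearson correlation of a and b."""
    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


columns = {
    "cac40": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sp500": [2.0, 4.0, 5.0, 8.0, 10.0],
    "euro_usd": [5.0, 4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```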


6 Predictive Modeling of the CAC 40

6.1 Introduction to the Methodology

This analysis aims to predict the level of the CAC 40 index using a multiple linear regression model based on key macroeconomic and market variables, namely:

  • The EUR/USD exchange rate
  • The S&P 500 index level
  • The unemployment rate
  • The 3-month Euribor rate

These explanatory variables were chosen for their economic relevance and for the correlations with CAC 40 fluctuations observed in Section 5. The main objective is to quantify the influence of these factors on CAC 40 prices and to evaluate whether reliable predictions of future trends are feasible.

6.2 Data Presentation and Preprocessing

To ensure the quality of the data used in this analysis, the following preprocessing steps were applied:

1) Handling Missing Values: An initial inspection found no significant missing values.

2) Selection of Numerical Variables: Only relevant quantitative variables were retained for the analysis; Figure 1 displays their distributions.

The graphs highlight several characteristics of the explanatory variables and the target variable (CAC40_Price):

  • A multi-modal distribution for the CAC40 and S&P 500, reflecting periods marked by crises or rapid economic recoveries.

  • A notable concentration of EUR/USD rates around 1.2, indicating relative stability over the studied period.

  • Significant heterogeneity in 3-month Euribor rates, illustrating the impact of monetary policies over time.

These preliminary observations justify the integration of these macroeconomic variables as the foundation for the predictive model of the CAC 40.

6.3 Correlation Analysis

Before constructing the linear regression model, a correlation matrix was calculated to explore the relationships between the explanatory variables and the CAC 40. The results highlight several significant relationships:

  • A strong positive correlation with the S&P 500 index, confirming the interdependence of global financial markets.

  • A moderate correlation with the EUR/USD rate, reflecting the impact of currency fluctuations on the companies within the index.

  • A negative correlation with the 3-month Euribor rate, indicating an inverse influence of credit conditions on stock performance.

  • A weak correlation with the unemployment rate, suggesting that this indicator has a limited short-term impact on index variations.

These results confirm the relevance of the selected explanatory variables for modeling while also highlighting potential limitations related to variables with weak correlations to the CAC 40.

6.4 Construction of the Linear Regression Model

To analyze variations in the CAC 40 index, we specified a linear regression model based on the following formula:

\[ CAC40\_Price \sim EUR\_USD\_Price + SP500\_Price + taux\_chomage + EURIBOR\_3M\_Rate \]
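Written out with coefficients, the formula above corresponds to the model below. Here t indexes observation dates, β0 to β4 are the coefficients to be estimated, and ε_t is the error term. Assuming ordinary least squares, the usual estimator behind a linear regression such as R's `lm`, the predictors plus a column of ones are stacked into a matrix X. The coefficient estimates then solve the normal equations:

\[ CAC40\_Price_t = \beta_0 + \beta_1\, EUR\_USD\_Price_t + \beta_2\, SP500\_Price_t + \beta_3\, taux\_chomage_t + \beta_4\, EURIBOR\_3M\_Rate_t + \varepsilon_t \]

\[ \hat{\beta} = (X^\top X)^{-1} X^\top y \]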

This choice is based on the economic relevance of the selected explanatory variables:

  • EUR/USD: An indicator of the price competitiveness of French exporting companies and the impact of exchange rate fluctuations on their financial results.
  • S&P 500: A reference for international financial markets, reflecting global trends.
  • Unemployment rate: An indirect measure of economic health and domestic consumption.
  • 3-month Euribor rate: A proxy for credit and liquidity conditions in the European market.

The estimated coefficients reveal several significant relationships:

  • S&P 500: A strongly positive coefficient, indicating that the CAC 40 closely follows U.S. market trends.
  • EUR/USD: A moderately positive coefficient, suggesting that, holding the other variables constant, an appreciation of the euro may be accompanied by increases in the CAC 40.
  • 3-month Euribor rate: A negative coefficient, consistent with the idea that higher credit costs hinder the growth of listed companies.
  • Unemployment rate: A weak influence, consistent with its low correlation with the index.

While the model captures major linear relationships, it does not fully account for the complex dynamics characterizing the market.
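To show what the fit does, the Python sketch below estimates the coefficients by solving the normal equations with Gauss-Jordan elimination. It is an illustration, not the project's code: R's `lm` uses a QR decomposition, which is numerically more stable. The predictors are on very different scales (S&P 500 points versus Euribor percentages), so raw coefficient sizes are not directly comparable. Standardizing the predictors first is a common way to compare their influence. The example data are synthetic and generated from known coefficients.

```python
def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y, with an intercept column
    prepended to each row of X. Returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented normal-equation system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data generated exactly from y = 2 + 3*x1 - 1*x2
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
print(ols(X, y))  # approximately [2.0, 3.0, -1.0]
```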

6.5 Cross-Validation and Performance Evaluation

6.5.1 Cross-Validation

To assess the robustness of the model, its predictions were evaluated on held-out data. The data was split into two subsets:

  • Training (80%): Used to adjust the coefficients.
  • Test (20%): Used to evaluate predictions on independent data.
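The text does not say whether the 80/20 split was random or chronological. With daily price levels, a random split places most test days next to training days. Because prices are highly autocorrelated, test scores can then look better than genuine out-of-sample performance; a chronological split avoids this. In R's rsample package, `initial_split()` samples rows at random, while `initial_time_split()` keeps the first portion for training. Below is a minimal Python equivalent of the chronological version, assuming each row starts with its date; the rows are hypothetical.

```python
from datetime import date, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Sort rows by date (first field) and put the earliest `train_fraction`
    in the training set, so every test date is later than every training date."""
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical rows: (date, cac40_price)
rows = [(date(2023, 1, 2) + timedelta(days=i), 7000.0 + i) for i in range(10)]
train, test = chronological_split(rows)
print(len(train), len(test))  # 8 2
```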

6.5.1.1 Performance Metrics

The model’s performance was evaluated using several metrics:

## # A tibble: 3 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard     488.   
## 2 rsq     standard       0.772
## 3 mae     standard     407.
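
  • RMSE (Root Mean Square Error): 488.08 index points, the square root of the mean squared prediction error, which penalizes large errors more heavily.
  • R² (Coefficient of Determination): 0.772, meaning the model explains about 77% of the variance in CAC 40 prices.
  • MAE (Mean Absolute Error): 407.01 index points, the average absolute gap between predicted and actual values.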

These results indicate satisfactory performance, although the model still has room for improvement, particularly in reducing errors during periods of high volatility.
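The three metrics can be computed as in the Python sketch below. The layout of the output above (`.metric`, `.estimator`, `.estimate`) suggests the yardstick package was used. If so, note that yardstick's `rsq()` computes R² as the squared correlation between observed and predicted values, whereas `rsq_trad()` uses the traditional 1 - SSE/SST. The two can differ, especially on test data, so the sketch computes both. The four observations are invented.

```python
import math


def regression_metrics(actual, predicted):
    """RMSE, MAE, correlation-based R^2 (as in yardstick's rsq) and the
    traditional R^2 = 1 - SSE/SST (as in yardstick's rsq_trad)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    sst = sum((a - mean_a) ** 2 for a in actual)
    ssp = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "rsq": cov * cov / (sst * ssp),
        "rsq_trad": 1 - sse / sst,
    }


metrics = regression_metrics([100, 200, 300, 400], [110, 190, 320, 380])
print(metrics)  # rmse ≈ 15.81, mae = 15.0, rsq ≈ 0.982, rsq_trad ≈ 0.98
```

RMSE is always at least as large as MAE, and the gap widens when a few large errors dominate. The gap between the reported 488 and 407 is consistent with the larger deviations observed in volatile periods.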

6.5.2 Comparison of Actual and Predicted Values

The graph comparing actual and predicted values illustrates the performance of the linear regression model for the CAC 40. The main observations are as follows:

  • General Trends Well Captured:
    The model effectively reproduces the major dynamics of the index, particularly during prolonged phases of growth or decline.

  • Localized Deviations:
    Certain divergences appear during periods of high volatility, such as financial crises or peaks of uncertainty. For instance, abrupt fluctuations during crisis periods are not fully anticipated by the model.

  • Closer Fit in Stable Periods:
    In the absence of marked volatility, the model’s predictions track observed values more closely, indicating robustness under normal market conditions.

These results demonstrate the model’s ability to provide reliable medium-term predictions while revealing its limitations when faced with sudden and extreme variations.

6.5.3 Observed Limitations

Several inherent limitations of the model have been identified:

  • Linearity Assumption:
    • Linear regression does not model non-linear relationships or complex dynamics effectively.
  • Variable Selection:
    • The absence of variables such as volatility indicators or market sentiment reduces explanatory power.
  • Crisis Periods:
    • Prediction accuracy decreases significantly during major economic events.

6.6 Perspectives for Improvement

Introduction of New Variables
To refine predictions, new explanatory variables could be added:

External Variables:
- Commodity prices (oil, gold)
- Volatility indicators, such as the VIX index
- Detailed sectoral data, including the performance of key sectors in the CAC 40

Internal Variables:
- Foreign capital flows
- Earnings announcements of companies within the index

Exploration of Advanced Models
Testing non-linear approaches to better capture the complexity of relationships between variables:

  • Random Forests
  • Simple or recurrent neural networks (MLP, LSTM)
  • Boosting models (XGBoost, LightGBM)

These techniques may better capture complex and non-linear relationships between explanatory variables and the CAC 40.

Time Series-Specific Modeling
Adopting models tailored to temporal data:

  • ARIMA models (or their seasonal extension, SARIMA) to analyze trends and seasonal effects
  • VAR models (Vector AutoRegressive) to study dynamic interactions between multiple indices and macroeconomic variables
  • Hybrid models combining time series analysis and machine learning

7 Conclusion

This project highlighted the factors influencing CAC 40 fluctuations through an approach combining statistical tools, graphical analyses, and machine learning models. The study of correlations between the CAC 40 and key macroeconomic variables, such as the EUR/USD exchange rate, the S&P 500, 3-month Euribor rates, and the unemployment rate, revealed complex interactions reflecting global economic and financial dynamics.

The linear regression model demonstrated a notable ability to capture the index’s general trends under stable market conditions. However, it remains limited when facing abrupt variations or high volatility contexts, offering opportunities for future improvements.

The results of this project provide interesting perspectives for practical applications. For example, developing an interactive model could allow for simulating hypothetical scenarios, such as a sudden rise in Euribor rates or significant variations in the S&P 500. Integrating these features into an R Shiny interface would enable real-time prediction exploration, enhancing their usefulness for investors.

Additionally, predictions from this model could be used to optimize investment strategies. By combining these predictions with fundamental analyses, it would be possible to design active portfolio management approaches based on anticipated signals and robust models.

For future research, the project offers pathways to deepen the analysis. Extending the methodology to other international indices, such as the DAX or the FTSE, could lead to enriching comparisons and a better understanding of the specificities of different markets. Exploring multivariate models that consider interactions between multiple indices and macroeconomic variables could also improve predictions by capturing more complex global dynamics.

In conclusion, this project provides a solid foundation for understanding CAC 40 variations and anticipating its future trends. The identified improvement perspectives—especially through the introduction of new explanatory variables and the use of advanced models—pave the way for even more precise and useful analyses. These efforts will help develop powerful predictive tools that assist investors in navigating an ever-evolving economic and financial environment.
